The current lecture is built loosely off of Levy and Mislevy (2016) Chapter 5, with information from Chapters 4 and 6 (although not necessarily all from the book).

Introduction to Markov Chain Monte Carlo Estimation

Bayesian analysis centers on estimating the posterior distribution of the model parameters.

Markov Chain Monte Carlo

Commonalities Across MCMC Algorithms

To demonstrate each type of algorithm, we will use a model for a normal distribution (Chapter 4 of Levy & Mislevy, 2016). We will investigate each algorithm briefly, then switch over to JAGS to show the syntax and let JAGS do the work.

We will conclude by talking about assessing convergence and how to report parameter estimates.

Example Data: Post-Diet Weights

Borrowing an example from [https://stats.idre.ucla.edu/spss/library/spss-libraryhow-do-i-handle-interactions-of-continuous-andcategorical-variables/](https://stats.idre.ucla.edu/spss/library/spss-libraryhow-do-i-handle-interactions-of-continuous-andcategorical-variables/), the file DietData.csv contains data from 30 respondents who participated in a study regarding the effectiveness of three types of diets. Variables in the data set are:

  1. Respondent: Respondent number 1-30
  2. DietGroup: A 1, 2, or 3 representing the group to which a respondent was assigned
  3. HeightIN: The respondent’s height in inches
  4. WeightLB: The respondent’s weight, in pounds, recorded following the study

Here, weight is the dependent variable. The motivating research question is:

Are there differences in final weights between the three diet groups, and, if so, what is the nature of the differences?

The following syntax reads the data into R and provides some pictures of the data:

# requires the ggplot2 package for the plots that follow
library(ggplot2)

DietData = read.csv(file = "DietData.csv")

ggplot(data = DietData, aes(x = WeightLB)) + 
  geom_histogram(aes(y = ..density..), position = "identity", binwidth = 10) + 
  geom_density(alpha=.2) 

ggplot(data = DietData, aes(x = WeightLB, color = factor(DietGroup), fill = factor(DietGroup))) + 
  geom_histogram(aes(y = ..density..), position = "identity", binwidth = 10) + 
  geom_density(alpha=.2) 

ggplot(data = DietData, aes(x = HeightIN, y = WeightLB, shape = factor(DietGroup), color = factor(DietGroup))) +
  geom_smooth(method = "lm", se = FALSE) + geom_point()

Now, your turn to answer questions:

  1. What type of analysis seems most appropriate for these data?
  2. Is the dependent variable (WeightLB) appropriate as-is for such an analysis, or does it need to be transformed?
# full analysis model suggested by data:
FullModel = lm(formula = WeightLB ~ HeightIN + factor(DietGroup) + HeightIN:factor(DietGroup), data = DietData)

# examining assumptions and leverage of fit
plot(FullModel)

# looking at ANOVA table
anova(FullModel)
## Analysis of Variance Table
## 
## Response: WeightLB
##                            Df Sum Sq Mean Sq F value    Pr(>F)    
## HeightIN                    1    684     684  10.819  0.003092 ** 
## factor(DietGroup)           2  66726   33363 527.950 < 2.2e-16 ***
## HeightIN:factor(DietGroup)  2   2186    1093  17.293 2.234e-05 ***
## Residuals                  24   1517      63                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# looking at parameter summary
summary(FullModel)
## 
## Call:
## lm(formula = WeightLB ~ HeightIN + factor(DietGroup) + HeightIN:factor(DietGroup), 
##     data = DietData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.6724  -2.7169   0.4958   1.7918  16.4581 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  170.1664    29.2786   5.812 5.42e-06 ***
## HeightIN                      -0.3768     0.4587  -0.822 0.419392    
## factor(DietGroup)2          -172.5639    41.4062  -4.168 0.000345 ***
## factor(DietGroup)3          -132.6675    38.7846  -3.421 0.002241 ** 
## HeightIN:factor(DietGroup)2    2.4727     0.6486   3.812 0.000846 ***
## HeightIN:factor(DietGroup)3    3.5666     0.6132   5.817 5.36e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.949 on 24 degrees of freedom
## Multiple R-squared:  0.9787, Adjusted R-squared:  0.9742 
## F-statistic: 220.3 on 5 and 24 DF,  p-value: < 2.2e-16

Bayesian Analysis of Our Example Data

Let’s examine how our example data should be analyzed using Bayesian methods. To do so we need to specify:

  1. Likelihood function (from the model)
  2. Prior distributions for all model parameters

Linear Models Likelihood Function (Scalar Version)

The classic linear model is one where, for a respondent \(r\) \((r = 1, \dots, N)\), a dependent variable (or outcome), \(y_r\), is predicted by a set of independent variables (or predictors), \(x_{rv}\) with \(v = 1, \dots, V\) (and, sometimes, their interactive products), via a set of regression coefficients \(\beta_v\):

\[Y_r = \beta_0 + \beta_1 x_{r1} + \beta_2 x_{r2} + \dots + \beta_V x_{rV} + e_r = \mathbf{x}_r \boldsymbol{\beta} + e_r\]

Here, \(e_r\) is the residual or error term for respondent \(r\), with \(e_r \sim N(0, \sigma^2_e)\).

So, loosely, we can say that an observation \(Y_r \sim N( \mathbf{x}_r \boldsymbol{\beta}, \sigma^2_e)\). This now gives us the set of parameters for which we need to specify prior distributions: \(\boldsymbol{\beta}\) and \(\sigma^2_e\).
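To make this statement concrete, here is a small sketch (my addition, with arbitrary values for \(\boldsymbol{\beta}\) and \(\sigma_e\)) that simulates data from the likelihood \(Y_r \sim N(\mathbf{x}_r \boldsymbol{\beta}, \sigma^2_e)\) and checks that lm() approximately recovers the generating values:

set.seed(1)
N = 100
X = cbind(1, rnorm(N))           # design matrix: intercept plus one predictor
beta = c(150, 2)                 # hypothetical coefficient values
sigma.e = 8                      # hypothetical residual SD

# each Y_r is normal with mean x_r %*% beta and variance sigma.e^2
y = as.numeric(X %*% beta) + rnorm(N, mean = 0, sd = sigma.e)

# least squares estimates should be close to c(150, 2)
coef(lm(y ~ X[, 2]))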

Linear Models: Conjugate Prior Distributions

Conjugate priors are prior distributions that have the same form as the posterior distribution. As we will see in another lecture, conjugate priors are helpful because they make our Bayesian sampling algorithms run efficiently. We will stay with conjugate prior distributions for this lecture.
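As a minimal numeric illustration of conjugacy (my own sketch, not from the lecture): with a normal likelihood for the data (variance assumed known) and a normal prior on the mean, the posterior for the mean is again normal, with closed-form updates. The prior values below are arbitrary:

set.seed(2)
y = rnorm(25, mean = 10, sd = 3)   # hypothetical data
mu.0 = 0; tau2.0 = 100             # assumed prior mean and prior variance
sigma2 = 9                         # data variance, treated as known

# standard conjugate update for the mean of a normal distribution:
# posterior precision = prior precision + N/sigma2
post.var  = 1 / (1 / tau2.0 + length(y) / sigma2)
post.mean = post.var * (mu.0 / tau2.0 + sum(y) / sigma2)
c(post.mean, post.var)

Because the posterior has the same (normal) form as the prior, the update can be iterated cheaply inside a sampling algorithm, which is why conjugacy makes the algorithms run efficiently.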

Choosing conjugate prior distributions in Bayesian linear models has one complicating factor: The likelihood for \(\boldsymbol{\beta}\) is conditional on \(\sigma^2_e\). So, let’s start there. In particular:

\[f\left(\boldsymbol{\beta}, \sigma^2_e \right) = f\left( \boldsymbol{\beta} \mid \sigma^2_e \right)f \left( \sigma^2_e \right)\]

Prior for \(\sigma^2_e\)

To make sense of this, we can start with \(\sigma^2_e\), whose conjugate prior distribution is inverse-gamma. The inverse-gamma distribution (see [https://en.wikipedia.org/wiki/Inverse-gamma_distribution](https://en.wikipedia.org/wiki/Inverse-gamma_distribution)) has support of \((0,\infty)\), which matches the range of \(\sigma^2_e\). The distribution has two parameters, \(\alpha\) (sometimes called the shape; \(\alpha>0\)) and \(\beta\) (sometimes called the scale, although in the invgamma R package it is called the rate; \(\beta>0\)).

To get a sense of how these parameters function, a good measure of central tendency here is the mode, \(\frac{\beta}{\alpha+1}\). In practice, we can rephrase these parameters in terms of our expected degrees of freedom total \(\nu_0\) and our expected residual variance \(\sigma^2_{e0}\):

\[\alpha_0 = \frac{\nu_0}{2}\]

\[\beta_0 = \frac{\nu_0\sigma^2_{e0}}{2}\]

In the plot below, I use the values from our classical linear model analysis to provide an example of what the prior looks like:

# requires the invgamma package for dinvgamma() and qinvgamma()
library(invgamma)

nu.0 = summary(FullModel)$df[2]
sigma2.0 = summary(FullModel)$sigma^2

alpha.0 = nu.0/2
beta.0 = nu.0*sigma2.0/2

PriorMode = beta.0/(alpha.0+1)

maxX = qinvgamma(.995, shape = alpha.0, rate= beta.0)
minX = 0.001
step = (maxX - minX)/10000

sigma2 = seq(minX, maxX, step)
height = dinvgamma(x = sigma2, shape = alpha.0, rate = beta.0)

plot(x = sigma2, y = height, type = "l", ylab = expression(paste("f(", sigma[e]^2, ")")), xlab = expression(sigma[e]^2), 
     main = paste0("Prior Mode:", round(PriorMode, 2), "; MLE:", round(sigma2.0, 2)))

lines(x = c(PriorMode, PriorMode), y = c(0, dinvgamma(x = PriorMode, shape = alpha.0, rate = beta.0)))

One complicating factor is that for normal distributions (where we will need to put the parameter \(\sigma^2_e\) with this prior) JAGS uses a “precision” parameter of \(\frac{1}{\sigma^2_e}\) rather than a “dispersion” of \(\sigma^2_e\). Therefore, we must specify our prior for \(\tau_e = \frac{1}{\sigma^2_e}\), which follows a gamma distribution.
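A quick simulation check of this relationship (my addition, with arbitrary \(\alpha\) and \(\beta\)): if \(\tau_e \sim \text{Gamma}(\alpha, \beta)\), then \(\sigma^2_e = 1/\tau_e\) follows an inverse-gamma distribution with the same parameters, which we can verify against the invgamma package's quantile function:

# requires the invgamma package
library(invgamma)

set.seed(3)
alpha = 12; beta = 750                       # arbitrary values for illustration
tau = rgamma(100000, shape = alpha, rate = beta)
sigma2 = 1 / tau                             # reciprocal of each gamma draw

# simulated quantiles of 1/tau should match the inverse-gamma quantiles
quantile(sigma2, c(.25, .5, .75))
qinvgamma(c(.25, .5, .75), shape = alpha, rate = beta)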

The good news is that the parameters of the gamma distribution are the same; however, you have to check what they are called in R. Here is how the previous plot looks with a gamma distribution:

nu.0 = summary(FullModel)$df[2]
sigma2.0 = summary(FullModel)$sigma^2
tau.e.0 = 1/sigma2.0

alpha.0 = nu.0/2
beta.0 = nu.0*sigma2.0/2

PriorMode = (alpha.0-1)*(1/beta.0)

maxX = qgamma(.995, shape = alpha.0, rate= beta.0)
minX = 0.0001
step = .00001

tau.e = seq(minX, maxX, step)
height = dgamma(x = tau.e, shape = alpha.0, rate = beta.0)

plot(x = tau.e, y = height, type = "l", ylab = expression(paste("f(", tau[e], ")")), xlab = expression(tau[e]), 
     main = paste0("Prior Mode:", round(PriorMode, 2), "; MLE:", round(1/sigma2.0, 2)))
lines(x = c(PriorMode, PriorMode), y = c(0, dgamma(x = PriorMode, shape = alpha.0, rate = beta.0)))

“Uninformative” Choices of Conjugate Prior for \(\sigma^2_e\) (really, \(\tau_e\))

One way to make an “uninformative” but conjugate prior for \(\sigma^2_e\) is to recall the terms we used to specify the prior parameters: \(\nu_0\) as the expected DF total and \(\sigma^2_{e0}\) as the expected residual variance.

For the residual variance:

  1. The maximum residual variance is the marginal variance of \(y\), which occurs when \(\boldsymbol{\beta} = \mathbf{0}\)
  2. Higher values of variance parameters result in higher posterior standard deviations for \(\boldsymbol{\beta}\), meaning they are more conservative

So, we’ll start with an estimate of \(\sigma^2_{e0} = \sigma^2_y\). For our sample, \(\sigma^2_y =\) 2452.14.

Next, recall that \(\nu_0\) is the expected degrees of freedom total for the analysis. As the degrees of freedom decrease, the asymptotic sampling error of \(\sigma^2_e\) grows large, so you can pick a value of \(\nu_0\) that is small (but above 0). See the following plot (shown for \(\sigma^2_e\), but the parameter choices apply to \(\tau_e\) as well):

sigma2.0.U = var(DietData$WeightLB)
nu.0.U = 1

alpha.0.U = nu.0.U/2
beta.0.U = (nu.0.U*sigma2.0.U)/2

nu.0 = summary(FullModel)$df[2]
sigma2.0 = summary(FullModel)$sigma^2

alpha.0 = nu.0/2
beta.0 = (nu.0*sigma2.0)/2

PriorMode = beta.0/(alpha.0+1)

maxX = 3*sigma2.0.U
minX = 0.001
step = (maxX - minX)/10000

sigma2 = seq(minX, maxX, step)

height = dinvgamma(x = sigma2, shape = alpha.0, rate = beta.0)
heightU = dinvgamma(x = sigma2, shape = alpha.0.U, rate = beta.0.U)

matplot(
  x = cbind(sigma2, sigma2),
  y = cbind(height, heightU),
  type = "l",
  ylab = expression(paste("f(", sigma[e] ^ 2, ")")),
  xlab = expression(sigma[e] ^ 2),
  main = "Red is uninformative prior; black is prior based on suggestion in book"
)

matplot(
  x = cbind(sigma2, sigma2),
  y = cbind(height, heightU),
  type = "l",
  ylab = expression(paste("f(", sigma[e] ^ 2, ")")),
  xlab = expression(sigma[e] ^ 2),
  main = "Same Function: Close Up Near Original Prior",
  xlim = c(0, qinvgamma(.995, shape = alpha.0, rate = beta.0))
)

Prior for \(\boldsymbol{\beta}\)

Conditional on a value of \(\sigma^2_e\), the conjugate prior for \(\boldsymbol{\beta}\) follows a multivariate normal distribution: \(\boldsymbol{\beta} \sim MVN\left(\bar{\boldsymbol{\beta}}, \boldsymbol{\Sigma}_\boldsymbol{\beta}\right)\). Often, this is expressed as a set of independent univariate priors where the mean for each \(\beta\) is given along with some type of variance. That said, normal priors in JAGS are specified with a precision instead of a variance (that is, 1/variance).

These priors can be made uninformative by selecting very low values for the precision (meaning very high values for the variance). Remember, approximately 99.7% of the mass of a univariate normal distribution falls within +/- 3 SD of the mean.

That said, the scale of the regression coefficients will not be constant, so if you pick one hyperparameter value for the prior of every coefficient, pick a very large variance.
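These two facts are easy to verify in base R (my addition): the mass within 3 SD of the mean, and the SD implied by a very small precision such as 0.001:

pnorm(3) - pnorm(-3)    # mass of a normal within +/- 3 SD of its mean, about 0.997
sqrt(1 / 0.001)         # a precision of 0.001 implies an SD of about 31.6

So a prior precision near zero places non-negligible prior mass over a very wide range of coefficient values.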

Bayesian Linear Models Analysis with JAGS

The first step to building our Bayesian linear model analysis in JAGS is to put all of our data and prior values into an R list object. Here, we will also set the number of chains and the values of the hyperparameters for the prior distributions:

# specify specs for analysis
n.chains = 4

# specify values of hyperparameters for prior distributions

# for sigma2_e (tau)
sigma2.0 = var(DietData$WeightLB)
nu.0 = 1

alpha.tau.0 = nu.0/2
beta.tau.0 = (nu.0*sigma2.0)/2

# for betas:
betaMean.0 = 0
betaVariance.0 = 100000000
betaTau.0 = 1/betaVariance.0

# take data frame and create model matrix -- which gives us the columns of X:
BayesianFullModel.Matrix = model.matrix(object = FullModel)

# append with weight and then select only columns needed for model:
BayesianFullModel.Matrix = data.frame(cbind(DietData$WeightLB, BayesianFullModel.Matrix))
names(BayesianFullModel.Matrix) = c("weight", "intercept", "height", "group2", "group3", "hg2", "hg3")
BayesianFullModel.Matrix = BayesianFullModel.Matrix[c("weight", "height", "group2", "group3")]

# add data and N to list object that we will pass to JAGS
BayesianFullModel.JAGS.Data = list(N = nrow(DietData),
                                   weight = BayesianFullModel.Matrix$weight,
                                   height = BayesianFullModel.Matrix$height,
                                   group2 = BayesianFullModel.Matrix$group2,
                                   group3 = BayesianFullModel.Matrix$group3, 
                                   betaMean.0 = betaMean.0,
                                   betaTau.0 = betaTau.0, 
                                   alpha.tau.0 = alpha.tau.0,
                                   beta.tau.0 = beta.tau.0)

Next, we can create our JAGS model syntax. Because we are using the rjags package here, we store the syntax in a character string (later, with the R2jags package, we will enclose the syntax in an R function instead). Here, we put the names of the variables and hyperparameters we created in our previous step directly into the JAGS model syntax.

There are several things to note here, flagged in the comments embedded in the syntax:

BayesianFullModel.Syntax = "

  model{

    # prior distributions: 
    #   Note that terms on the left are model parameter names we create here
    beta.0             ~ dnorm(betaMean.0, betaTau.0) # prior for the intercept
    beta.height        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional slope of height
    beta.group2        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 2 and group 1
    beta.group3        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 3 and group 1
    beta.height.group2 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 2 (dif in slope from group 1) 
    beta.height.group3 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 3 (dif in slope from group 1) 
    tau.e              ~ dgamma(alpha.tau.0, beta.tau.0)      #prior for 1/sigma^2_e, residual precision
    
    
    # conditional distribution of the data (uses priors above)
    for (i in 1:N){
      
      # creating conditional mean to put into model statement 
      weight.hat[i] <- beta.0 + beta.height*height[i] + beta.group2*group2[i] + beta.group3*group3[i] + beta.height.group2*height[i]*group2[i] + beta.height.group3*height[i]*group3[i]
      
      # error values (for R^2)
      error[i] = weight[i]-weight.hat[i]
      
      # likelihood from model:
      weight[i] ~ dnorm(weight.hat[i], tau.e)
      
    }
    
    # created values to track throughout the Markov chain
    sigma2.e  <- 1/tau.e                  #create residual variance from 1/precision
    sigma.e   <- sqrt(sigma2.e)           #create residual SD
    
    # calculate mean difference between groups 2 and 3:
    dif.group2.group3 <- beta.group3 - beta.group2
    
    # calculate R^2 based on http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf
    variance.pred = sd(weight.hat[])*sd(weight.hat[])
    variance.error = sd(error[])*sd(error[])
    Rsquare = variance.pred/(variance.pred + variance.error)
  }

"
# the parameter names come from the model syntax in the previous step
BayesianFullModel.Parameters = c("beta.0", 
                                 "beta.height", 
                                 "beta.group2", 
                                 "beta.group3", 
                                 "beta.height.group2", 
                                 "beta.height.group3", 
                                 "dif.group2.group3",
                                 "tau.e", 
                                 "sigma2.e", 
                                 "sigma.e",
                                 "Rsquare",
                                 "deviance")

# in order to have reproducible Markov chains, we need to initialize the random number generator and seed values
# pick a number: here is the date I created this syntax
BayesianFullModel.Seed = 06022019

#   Note: the random number seed cannot be the same for each chain or the chains will be identical
RNGname = c("Wichmann-Hill","Marsaglia-Multicarry", "Super-Duper", "Mersenne-Twister")
if (RNGname[1] %in% c("Wichmann-Hill", "Marsaglia-Multicarry", 
  "Super-Duper", "Mersenne-Twister")) {
  RNGname[1] <- paste("base::", RNGname[1], sep = "")
}

BayesianFullModel.Init.Values <- vector("list", n.chains)
for (i in 1:n.chains) {
  BayesianFullModel.Init.Values[[i]]$.RNG.name <- RNGname[1]
  BayesianFullModel.Init.Values[[i]]$.RNG.seed <- BayesianFullModel.Seed + i
}

Next, to run JAGS, we needed a few more values, created above: (1) the names of the parameters we wish to have JAGS track across the chains and (2) a random number seed for reproducibility. Once we have those, we can run JAGS. Below, I use the jags.model() function, to which we also provide the number of chains; the total number of iterations is given to coda.samples(), and the burn-in iterations are removed afterward with window().

# requires the rjags package (and a JAGS installation)
library(rjags)

# load two JAGS modules: GLM (which makes the analysis go more efficiently) and DIC (which gives an index of relative fit)
load.module("glm")
## module glm loaded
load.module("dic")
## module dic loaded
# Submit model to JAGS for compiling. We'll turn adaptation off to control it in the next line of syntax:
BayesianFullModel.JAGS = jags.model(file = textConnection(BayesianFullModel.Syntax), data = BayesianFullModel.JAGS.Data, n.adapt = 0, n.chains = 4, inits = BayesianFullModel.Init.Values)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 7
##    Total graph size: 259
## 
## Initializing model
# adapting the model: No adaptation needed as Gibbs Sampling is being used -- but good to try in case algorithm isn't Gibbs
adapt(object = BayesianFullModel.JAGS, n.iter = 1000)
## [1] TRUE
# Draw samples; NOTE: BURNIN IS INCLUDED IN SAMPLES
BayesianFullModel.Samples = coda.samples(model = BayesianFullModel.JAGS, variable.names = BayesianFullModel.Parameters, n.iter = 10000, thin = 1)

# Remove burnin from samples:
BayesianFullModel.Posterior = window(x = BayesianFullModel.Samples, start = 5001, end = 10000)

Checking Chain Convergence

After the model runs, the next step is to assess convergence of the chains to the posterior distribution. This can be accomplished in several ways:

# assessing convergence:

# visual assessment
plot(BayesianFullModel.Posterior) 

# using convergence diagnostics:
gelman.diag(BayesianFullModel.Posterior, multivariate = FALSE) 
## Potential scale reduction factors:
## 
##                    Point est. Upper C.I.
## Rsquare                     1          1
## beta.0                      1          1
## beta.group2                 1          1
## beta.group3                 1          1
## beta.height                 1          1
## beta.height.group2          1          1
## beta.height.group3          1          1
## deviance                    1          1
## dif.group2.group3           1          1
## sigma.e                     1          1
## sigma2.e                    1          1
## tau.e                       1          1
# checking autocorrelation: low at lag 1 for most parameters, negligible thereafter
autocorr.diag(BayesianFullModel.Posterior) 
##             Rsquare        beta.0  beta.group2 beta.group3  beta.height
## Lag 0   1.000000000  1.0000000000  1.000000000 1.000000000  1.000000000
## Lag 1   0.150625821 -0.0008196497 -0.003412160 0.007992317 -0.001429560
## Lag 5  -0.005211171  0.0098458295 -0.001800741 0.008898029  0.007988504
## Lag 10  0.004830756  0.0212639457 -0.000290625 0.004787050  0.021272860
## Lag 50 -0.005125523  0.0010910225  0.012951877 0.001860613  0.002353988
##        beta.height.group2 beta.height.group3      deviance
## Lag 0        1.0000000000        1.000000000  1.0000000000
## Lag 1       -0.0029447741        0.007692403  0.3234220496
## Lag 5       -0.0024587236        0.007174552  0.0002490533
## Lag 10      -0.0003975335        0.005750982  0.0089525877
## Lag 50       0.0131368090        0.002119942 -0.0044091911
##        dif.group2.group3       sigma.e      sigma2.e        tau.e
## Lag 0        1.000000000  1.0000000000  1.0000000000  1.000000000
## Lag 1        0.004809194  0.2023938463  0.2020213503  0.184900242
## Lag 5       -0.008748620 -0.0033246525 -0.0018487243 -0.007869463
## Lag 10       0.002598268  0.0035326988  0.0012221570  0.008525871
## Lag 50       0.003643658 -0.0004026842  0.0003452465 -0.001911239
summary(BayesianFullModel.Posterior)
## 
## Iterations = 5001:10000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                          Mean        SD  Naive SE Time-series SE
## Rsquare             9.671e-01  0.009067 6.412e-05      7.474e-05
## beta.0              1.708e+02 48.385970 3.421e-01      3.472e-01
## beta.group2        -1.732e+02 67.877682 4.800e-01      4.723e-01
## beta.group3        -1.333e+02 64.021723 4.527e-01      4.471e-01
## beta.height        -3.862e-01  0.757297 5.355e-03      5.431e-03
## beta.height.group2  2.482e+00  1.062998 7.517e-03      7.469e-03
## beta.height.group3  3.577e+00  1.012075 7.156e-03      6.983e-03
## deviance            2.240e+02  7.032384 4.973e-02      6.654e-02
## dif.group2.group3   3.987e+01 64.521135 4.562e-01      4.605e-01
## sigma.e             1.301e+01  1.932803 1.367e-02      1.679e-02
## sigma2.e            1.730e+02 53.436527 3.779e-01      4.640e-01
## tau.e               6.285e-03  0.001783 1.261e-05      1.512e-05
## 
## 2. Quantiles for each variable:
## 
##                          2.5%        25%        50%        75%     97.5%
## Rsquare             9.436e-01    0.96342    0.96937  9.733e-01   0.97735
## beta.0              7.393e+01  139.01303  171.02777  2.021e+02 265.44476
## beta.group2        -3.071e+02 -217.85516 -173.00514 -1.285e+02 -40.27762
## beta.group3        -2.596e+02 -175.50529 -132.88512 -9.171e+01  -5.40781
## beta.height        -1.871e+00   -0.87749   -0.38744  1.079e-01   1.12032
## beta.height.group2  3.910e-01    1.78268    2.48167  3.184e+00   4.58555
## beta.height.group3  1.562e+00    2.91743    3.57406  4.242e+00   5.57869
## deviance            2.123e+02  218.93856  223.38443  2.284e+02 239.68384
## dif.group2.group3  -8.691e+01   -2.51107   39.31963  8.169e+01 168.75507
## sigma.e             9.873e+00   11.62916   12.79293  1.413e+01  17.43055
## sigma2.e            9.748e+01  135.23742  163.65905  1.996e+02 303.82395
## tau.e               3.291e-03    0.00501    0.00611  7.394e-03   0.01026

The most common method for assessing convergence is the Gelman-Rubin \(\hat{R}\) (the potential scale reduction factor), which compares the within-chain and between-chain variance across multiple chains; values near 1 suggest convergence. The coda package has a number of additional convergence diagnostics.
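Two of those additional coda diagnostics can be run directly on the posterior object from the analysis above (a sketch; run it after the model has been sampled). effectiveSize() estimates the number of effectively independent draws behind each parameter, and geweke.diag() compares the means of the early and late portions of each chain:

# requires the coda package (already loaded by rjags)
library(coda)

# effective sample size per tracked parameter; values near the total number
# of retained draws indicate low autocorrelation
effectiveSize(BayesianFullModel.Posterior)

# Geweke's diagnostic: z-scores comparing early vs. late chain segments;
# large absolute values suggest the chain has not stabilized
geweke.diag(BayesianFullModel.Posterior)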

Checking Model-Data Fit: Posterior Predictive Model Checks

A commonly used method for evaluating the fit of a Bayesian model to the data is posterior predictive model checking, a process that simulates data using model parameters drawn from the converged section of the Markov chain. Using the Markov chain itself is important, as some parameters have dependencies in the posterior distribution.

nSimData = 1000

# create one big data object to randomly sample from:
BayesianFullModel.Posterior.All = do.call("rbind", BayesianFullModel.Posterior)
parameterNames = colnames(BayesianFullModel.Posterior.All)

# set up model matrix to use in repeated samples
BayesianFullModel.Matrix2 = BayesianFullModel.Matrix[,2:4]

# add columns for intercept and interactions
BayesianFullModel.Matrix2 = as.matrix(data.frame(intercept = matrix(data = 1, nrow = nrow(BayesianFullModel.Matrix2), ncol = 1), 
                                       BayesianFullModel.Matrix2,
                                       height.group2 = BayesianFullModel.Matrix$height*BayesianFullModel.Matrix$group2,
                                       height.group3 = BayesianFullModel.Matrix$height*BayesianFullModel.Matrix$group3))
colnames(BayesianFullModel.Matrix2)
## [1] "intercept"     "height"        "group2"        "group3"       
## [5] "height.group2" "height.group3"
parameterNames = c("beta.0", "beta.height", "beta.group2", "beta.group3", "beta.height.group2", "beta.height.group3")

draws = sample(x = 1:nrow(BayesianFullModel.Posterior.All), size = nSimData, replace = TRUE)  

distSummaries = data.frame(mean = matrix(data = NA, nrow = nSimData),
                           variance = matrix(data = NA, nrow = nSimData))

# the following syntax is put into a loop for didactic purposes (a version of apply() would be much faster)
for (i in 1:nSimData){
  
  # Grab parameters from iteration in draws
  parameters = BayesianFullModel.Posterior.All[draws[i],]

  # Create Beta vector (which becomes mean of random draw), matching parameters with respective columns in data
  meanVec = BayesianFullModel.Matrix2 %*% matrix(data = parameters[parameterNames], ncol = 1)
  
  # draw random Y for all observations
  ppcY = rnorm(n = nrow(BayesianFullModel.Matrix2), mean = meanVec, sd = parameters["sigma.e"])
  
  # record summaries of the distribution of Y to compare to observed Y
  distSummaries[i,]$mean = mean(ppcY)
  distSummaries[i,]$variance = var(ppcY)
  
}

# examine how observed sample mean compares to values generated using simulation:
hist(distSummaries$mean, main = "PPC of Sample Mean")
lines(x = c(mean(DietData$WeightLB), mean(DietData$WeightLB)), y = c(0,250), col = 2)

# examine how observed sample variance compares to values generated using simulation:
hist(distSummaries$variance)
lines(x = c(var(DietData$WeightLB), var(DietData$WeightLB)), y = c(0,250), col = 2)

Results

Our chains look converged and our model fit looks good, so now we can examine the results:

# the summary function doesn't print things out well:
summary(BayesianFullModel.Posterior)
## 
## Iterations = 5001:10000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                          Mean        SD  Naive SE Time-series SE
## Rsquare             9.671e-01  0.009067 6.412e-05      7.474e-05
## beta.0              1.708e+02 48.385970 3.421e-01      3.472e-01
## beta.group2        -1.732e+02 67.877682 4.800e-01      4.723e-01
## beta.group3        -1.333e+02 64.021723 4.527e-01      4.471e-01
## beta.height        -3.862e-01  0.757297 5.355e-03      5.431e-03
## beta.height.group2  2.482e+00  1.062998 7.517e-03      7.469e-03
## beta.height.group3  3.577e+00  1.012075 7.156e-03      6.983e-03
## deviance            2.240e+02  7.032384 4.973e-02      6.654e-02
## dif.group2.group3   3.987e+01 64.521135 4.562e-01      4.605e-01
## sigma.e             1.301e+01  1.932803 1.367e-02      1.679e-02
## sigma2.e            1.730e+02 53.436527 3.779e-01      4.640e-01
## tau.e               6.285e-03  0.001783 1.261e-05      1.512e-05
## 
## 2. Quantiles for each variable:
## 
##                          2.5%        25%        50%        75%     97.5%
## Rsquare             9.436e-01    0.96342    0.96937  9.733e-01   0.97735
## beta.0              7.393e+01  139.01303  171.02777  2.021e+02 265.44476
## beta.group2        -3.071e+02 -217.85516 -173.00514 -1.285e+02 -40.27762
## beta.group3        -2.596e+02 -175.50529 -132.88512 -9.171e+01  -5.40781
## beta.height        -1.871e+00   -0.87749   -0.38744  1.079e-01   1.12032
## beta.height.group2  3.910e-01    1.78268    2.48167  3.184e+00   4.58555
## beta.height.group3  1.562e+00    2.91743    3.57406  4.242e+00   5.57869
## deviance            2.123e+02  218.93856  223.38443  2.284e+02 239.68384
## dif.group2.group3  -8.691e+01   -2.51107   39.31963  8.169e+01 168.75507
## sigma.e             9.873e+00   11.62916   12.79293  1.413e+01  17.43055
## sigma2.e            9.748e+01  135.23742  163.65905  1.996e+02 303.82395
## tau.e               3.291e-03    0.00501    0.00611  7.394e-03   0.01026
# inspecting Bayesian Results
BayesianFullModel.Summary = summary(BayesianFullModel.Posterior)
round(BayesianFullModel.Summary$statistics, 3)
##                        Mean     SD Naive SE Time-series SE
## Rsquare               0.967  0.009    0.000          0.000
## beta.0              170.771 48.386    0.342          0.347
## beta.group2        -173.160 67.878    0.480          0.472
## beta.group3        -133.292 64.022    0.453          0.447
## beta.height          -0.386  0.757    0.005          0.005
## beta.height.group2    2.482  1.063    0.008          0.007
## beta.height.group3    3.577  1.012    0.007          0.007
## deviance            224.017  7.032    0.050          0.067
## dif.group2.group3    39.868 64.521    0.456          0.461
## sigma.e              13.011  1.933    0.014          0.017
## sigma2.e            173.025 53.437    0.378          0.464
## tau.e                 0.006  0.002    0.000          0.000

Now, let’s compare Bayesian with least squares results:

summary(FullModel)
## 
## Call:
## lm(formula = WeightLB ~ HeightIN + factor(DietGroup) + HeightIN:factor(DietGroup), 
##     data = DietData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.6724  -2.7169   0.4958   1.7918  16.4581 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  170.1664    29.2786   5.812 5.42e-06 ***
## HeightIN                      -0.3768     0.4587  -0.822 0.419392    
## factor(DietGroup)2          -172.5639    41.4062  -4.168 0.000345 ***
## factor(DietGroup)3          -132.6675    38.7846  -3.421 0.002241 ** 
## HeightIN:factor(DietGroup)2    2.4727     0.6486   3.812 0.000846 ***
## HeightIN:factor(DietGroup)3    3.5666     0.6132   5.817 5.36e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.949 on 24 degrees of freedom
## Multiple R-squared:  0.9787, Adjusted R-squared:  0.9742 
## F-statistic: 220.3 on 5 and 24 DF,  p-value: < 2.2e-16

Often, each parameter is summarized by its highest posterior density (HPD) interval, a Bayesian analog of a confidence interval. To make our lives easier, we can use the HDInterval package, which works directly with the coda output:

# requires the HDInterval package
library(HDInterval)

hdi(BayesianFullModel.Posterior)
##         Rsquare    beta.0 beta.group2 beta.group3 beta.height
## lower 0.9497963  72.73014   -306.7909 -260.431515   -1.883774
## upper 0.9787577 263.68190    -40.0764   -6.270331    1.099239
##       beta.height.group2 beta.height.group3 deviance dif.group2.group3
## lower          0.3770387           1.569665 211.2679         -92.06455
## upper          4.5577062           5.584401 237.9753         162.32140
##         sigma.e  sigma2.e       tau.e
## lower  9.576525  85.14509 0.003041983
## upper 16.892455 276.51649 0.009845272
## attr(,"credMass")
## [1] 0.95

One reason for the differences between the analyses is \(\sigma^2_e\): the values reported above are posterior means of \(\sigma^2_e\), which are pulled toward the tail of its right-skewed distribution. We can instead investigate the mode of \(\sigma^2_e\), which is more analogous to the MLE we get from least squares. Note: the packages below take a while to install. I have commented them out for this example, but if you would like to follow along, please uncomment them and allow time for the installation.

# 
# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("genefilter", version = "3.8")
# if (!requireNamespace("modeest")) install.packages("modeest")
# library(modeest)

# mlv(x = BayesianFullModel.Posterior.All[,10], method = "naive")
# mean(BayesianFullModel.Posterior.All[,10])
# 
# summary(FullModel)$sigma
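If you would rather not install those packages, the point can be illustrated with base R alone: for a right-skewed distribution like that of \(\sigma^2_e\), the mode (the peak of a kernel density estimate) falls below the mean. The draws below are illustrative, not from our posterior:

```r
# illustrative right-skewed draws: Gamma(shape = 2, rate = 1) has mode 1 and mean 2
set.seed(1)
draws = rgamma(10000, shape = 2, rate = 1)

# estimate the mode as the peak of a kernel density estimate
d = density(draws)
modeEst = d$x[which.max(d$y)]

modeEst       # near the true mode of 1
mean(draws)   # near the true mean of 2; pulled toward the right tail
```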

Look at what happens if we fix the residual variance to the value it took in the least squares analysis (now using the R2jags package to save some space):

BayesianFullModel.FixedVar = function(){
  # prior distributions: 
  #   Note that terms on the left are model parameter names we create here
  beta.0             ~ dnorm(betaMean.0, betaTau.0) # prior for the intercept
  beta.height        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional slope of height
  beta.group2        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 2 and group 1
  beta.group3        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 3 and group 1
  beta.height.group2 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 2 (dif in slope from group 1) 
  beta.height.group3 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 3 (dif in slope from group 1) 
  
  
  # conditional distribution of the data (uses priors above)
  for (i in 1:N){
    
    # creating conditional mean to put into model statement 
    weight.hat[i] <- beta.0 + beta.height*height[i] + beta.group2*group2[i] + beta.group3*group3[i] + beta.height.group2*height[i]*group2[i] + beta.height.group3*height[i]*group3[i]
    
    # error values (for R^2)
    error[i] = weight[i]-weight.hat[i]
    
    # likelihood from model:
    weight[i] ~ dnorm(weight.hat[i], 1/sampleVar)
    
  }
  
  # calculate mean difference between groups 2 and 3:
  dif.group2.group3 <- beta.group3 - beta.group2
  
  # calculate R^2 based on http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf
  variance.pred = sd(weight.hat[])*sd(weight.hat[])
  variance.error = sd(error[])*sd(error[])
  Rsquare = variance.pred/(variance.pred + variance.error)
}
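The \(R^2\) tracked inside the model follows Gelman et al.'s Bayesian \(R^2\): for each draw, the variance of the fitted values divided by the sum of the fitted-value and error variances. A quick base-R sketch of the same arithmetic, using hypothetical fitted values and residuals rather than our posterior:

```r
set.seed(1)
# hypothetical fitted values and residuals for a single posterior draw
weight.hat = rnorm(30, mean = 150, sd = 35)
error = rnorm(30, mean = 0, sd = 8)

# Bayesian R^2 for this draw: explained variance over total variance
variance.pred = sd(weight.hat)^2
variance.error = sd(error)^2
Rsquare = variance.pred / (variance.pred + variance.error)
Rsquare
```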

# the parameter names come from the model syntax in the previous step
BayesianFullModel.FixedVar.Parameters = c("beta.0", 
                                          "beta.height", 
                                          "beta.group2", 
                                          "beta.group3", 
                                          "beta.height.group2", 
                                          "beta.height.group3", 
                                          "dif.group2.group3",
                                          "Rsquare")

# add data and N to list object that we will pass to JAGS
BayesianFullModel.FixedVar.Data = list(N = nrow(DietData),
                                       weight = BayesianFullModel.Matrix$weight,
                                       height = BayesianFullModel.Matrix$height,
                                       group2 = BayesianFullModel.Matrix$group2,
                                       group3 = BayesianFullModel.Matrix$group3, 
                                       betaMean.0 = betaMean.0,
                                       betaTau.0 = betaTau.0, 
                                       sampleVar = summary(FullModel)$sigma^2)

# in order to have reproducible Markov chains, we need to initialize the random number generator and seed values
# pick a number: Here is the date I created this syntax
BayesianFullModel.FixedVar.Seed = 11022019

#   Note: here, the random number seed cannot be the same for each chain or the chains will be identical
RNGname = c("Wichmann-Hill","Marsaglia-Multicarry", "Super-Duper", "Mersenne-Twister")
if (RNGname[1] %in% c("Wichmann-Hill", "Marsaglia-Multicarry", 
  "Super-Duper", "Mersenne-Twister")) {
  RNGname[1] <- paste("base::", RNGname[1], sep = "")
}

BayesianFullModel.FixedVar.Init.Values <- vector("list", n.chains)
for (i in 1:n.chains) {
  BayesianFullModel.FixedVar.Init.Values[[i]]$.RNG.name <- RNGname[1]
  BayesianFullModel.FixedVar.Init.Values[[i]]$.RNG.seed <- BayesianFullModel.FixedVar.Seed + i
}



BayesianFullModelFixedVar.JAGS = jags(data = BayesianFullModel.FixedVar.Data,
                                      inits = BayesianFullModel.FixedVar.Init.Values,
                                      parameters.to.save = BayesianFullModel.FixedVar.Parameters, 
                                      model.file = BayesianFullModel.FixedVar,
                                      n.chains = 4,
                                      n.iter = 50000,
                                      n.burnin = 20000, 
                                      n.thin = 1)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 6
##    Total graph size: 256
## 
## Initializing model
BayesianFullModelFixedVar.JAGS
## Inference for Bugs model at "/var/folders/sl/n1qxb2m57pqfs1hcfqb8p4nx6fm6zn/T//Rtmp4KW8se/modelc9ce6fca8cf.txt", fit using jags,
##  4 chains, each with 50000 iterations (first 20000 discarded)
##  n.sims = 120000 iterations saved
##                     mu.vect sd.vect     2.5%      25%      50%      75%
## Rsquare               0.974   0.003    0.966    0.973    0.975    0.977
## beta.0              170.220  29.261  113.065  150.472  170.228  189.939
## beta.group2        -172.554  41.442 -253.720 -200.558 -172.523 -144.587
## beta.group3        -132.740  38.910 -208.717 -158.957 -132.881 -106.495
## beta.height          -0.378   0.458   -1.275   -0.686   -0.378   -0.068
## beta.height.group2    2.473   0.649    1.203    2.036    2.471    2.911
## beta.height.group3    3.568   0.615    2.359    3.154    3.570    3.981
## dif.group2.group3    39.814  38.793  -36.070   13.547   39.762   66.060
## deviance            209.507   3.456  204.761  206.957  208.862  211.362
##                      97.5%  Rhat  n.eff
## Rsquare              0.978 1.001 120000
## beta.0             227.439 1.001  73000
## beta.group2        -91.669 1.001 120000
## beta.group3        -56.233 1.001 120000
## beta.height          0.516 1.001  65000
## beta.height.group2   3.745 1.001 120000
## beta.height.group3   4.771 1.001 120000
## dif.group2.group3  116.034 1.001 120000
## deviance           217.932 1.001 120000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 6.0 and DIC = 215.5
## DIC is an estimate of expected predictive error (lower deviance is better).

Model Comparison

To show how to compare two models using the DIC (Deviance Information Criterion), we will again revisit the full model, only this time we will model the variance using a log-linear model, making this a location-scale model:

\[\log\left(\frac{1}{\sigma^2_e}\right) = \log(1) - \log \left( \sigma^2_e\right) = - \log \left( \sigma^2_e\right) = \gamma_0\]

Here, \(\gamma_0\) is a continuous variable that can take any real value, so we can use a normal prior distribution on \(\gamma_0\). Note that \(\gamma_0\) is the intercept of the log residual precision \(\tau_e = \frac{1}{\sigma^2_e}\), whereas \(-\gamma_0\) is the intercept of the log residual variance. As such, this analysis yields a log-normal prior distribution on \(\sigma^2_e\), so we are essentially comparing models with different priors.
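We can verify the implied prior by simulation: if \(\gamma_0\) is normal, then \(\sigma^2_e = \exp(-\gamma_0)\) is log-normal. The prior mean and SD below are arbitrary choices for illustration:

```r
set.seed(1)
# arbitrary normal prior on gamma_0 (the log residual precision)
gamma0 = rnorm(100000, mean = 0, sd = 0.5)

# implied prior on the residual variance: sigma^2_e = 1/tau_e = exp(-gamma_0)
sigma2 = exp(-gamma0)

# exp(-gamma_0) is log-normal with meanlog = 0 and sdlog = 0.5,
# so its median should be exp(0) = 1 and log(sigma2) should look normal
median(sigma2)
mean(log(sigma2))   # should be near minus the prior mean, i.e., 0
```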

BayesianFullModel = function(){
 # prior distributions: 
    #   Note that terms on the left are model parameter names we create here
    beta.0             ~ dnorm(betaMean.0, betaTau.0) # prior for the intercept
    beta.height        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional slope of height
    beta.group2        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 2 and group 1
    beta.group3        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 3 and group 1
    beta.height.group2 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 2 (dif in slope from group 1) 
    beta.height.group3 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 3 (dif in slope from group 1) 
    tau.e              ~ dgamma(alpha.tau.0, beta.tau.0)      #prior for 1/sigma^2_e, residual precision
    
    
    # conditional distribution of the data (uses priors above)
    for (i in 1:N){
      
      # creating conditional mean to put into model statement 
      weight.hat[i] <- beta.0 + beta.height*height[i] + beta.group2*group2[i] + beta.group3*group3[i] + beta.height.group2*height[i]*group2[i] + beta.height.group3*height[i]*group3[i]
      
      # error values (for R^2)
      error[i] = weight[i]-weight.hat[i]
      
      # likelihood from model:
      weight[i] ~ dnorm(weight.hat[i], tau.e)
      
    }
    
    # created values to track throughout the Markov chain
     sigma2.e  <- 1/tau.e                  #create residual variance from 1/precision
     sigma.e   <- sqrt(sigma2.e)           #create residual SD
    
    # calculate mean difference between groups 2 and 3:
    dif.group2.group3 <- beta.group3 - beta.group2
    
    #difRPrec.group2.group3 <- gamma.group3 - gamma.group2
    
    # calculate R^2 based on http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf
     variance.pred = sd(weight.hat[])*sd(weight.hat[])
     variance.error = sd(error[])*sd(error[])
     Rsquare = variance.pred/(variance.pred + variance.error)
}

BayesianFullModel.Results = jags(data = BayesianFullModel.JAGS.Data, inits = BayesianFullModel.Init.Values, 
                                 parameters.to.save = BayesianFullModel.Parameters,
                                 model.file = BayesianFullModel, n.chains = 4, n.iter = 50000, n.burnin = 20000, n.thin = 1)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 7
##    Total graph size: 259
## 
## Initializing model
## Warning in FUN(X[[i]], ...): Failed to set trace monitor for deviance
## Monitor already exists and cannot be duplicated
BayesianFullModel2 = function(){
 # prior distributions: 
    #   Note that terms on the left are model parameter names we create here
    beta.0             ~ dnorm(betaMean.0, betaTau.0) # prior for the intercept
    beta.height        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional slope of height
    beta.group2        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 2 and group 1
    beta.group3        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 3 and group 1
    beta.height.group2 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 2 (dif in slope from group 1) 
    beta.height.group3 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 3 (dif in slope from group 1) 
    gamma.0            ~ dnorm(betaMean.0, betaTau.0) #prior for the log residual precision
    tau.e <- exp(gamma.0)
    
    
    # conditional distribution of the data (uses priors above)
    for (i in 1:N){
      
      # creating conditional mean to put into model statement 
      weight.hat[i] <- beta.0 + beta.height*height[i] + beta.group2*group2[i] + beta.group3*group3[i] + beta.height.group2*height[i]*group2[i] + beta.height.group3*height[i]*group3[i]
      
      # error values (for R^2)
      error[i] = weight[i]-weight.hat[i]
      
      # likelihood from model:
      weight[i] ~ dnorm(weight.hat[i], tau.e)
      
    }
    
    # created values to track throughout the Markov chain
     sigma2.e  <- 1/tau.e                  #create residual variance from 1/precision
     sigma.e   <- sqrt(sigma2.e)           #create residual SD
    
    # calculate mean difference between groups 2 and 3:
    dif.group2.group3 <- beta.group3 - beta.group2
    
    #difRPrec.group2.group3 <- gamma.group3 - gamma.group2
    
    # calculate R^2 based on http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf
     variance.pred = sd(weight.hat[])*sd(weight.hat[])
     variance.error = sd(error[])*sd(error[])
     Rsquare = variance.pred/(variance.pred + variance.error)
}

BayesianFullModel.Results2 = jags(data = BayesianFullModel.JAGS.Data, inits = BayesianFullModel.Init.Values, 
                                 parameters.to.save = c(BayesianFullModel.Parameters, "gamma.0"),
                                 model.file = BayesianFullModel2, n.chains = 4, n.iter = 50000, n.burnin = 20000, n.thin = 1)
## Warning in jags.model(model.file, data = data, inits = init.values,
## n.chains = n.chains, : Unused variable "alpha.tau.0" in data
## Warning in jags.model(model.file, data = data, inits = init.values,
## n.chains = n.chains, : Unused variable "beta.tau.0" in data
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 7
##    Total graph size: 258
## 
## Initializing model
## Warning in FUN(X[[i]], ...): Failed to set trace monitor for deviance
## Monitor already exists and cannot be duplicated
BayesianFullModel3 = function(){
 # prior distributions: 
    #   Note that terms on the left are model parameter names we create here
    beta.0             ~ dnorm(betaMean.0, betaTau.0) # prior for the intercept
    beta.height        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional slope of height
    beta.group2        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 2 and group 1
    beta.group3        ~ dnorm(betaMean.0, betaTau.0) #prior for the conditional mean difference between group 3 and group 1
    beta.height.group2 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 2 (dif in slope from group 1) 
    beta.height.group3 ~ dnorm(betaMean.0, betaTau.0) #prior for the interaction of height with group 3 (dif in slope from group 1) 
    gamma.0            ~ dnorm(betaMean.0, betaTau.0) #prior for the intercept of the log residual precision
    gamma.group2       ~ dnorm(betaMean.0, betaTau.0) #prior for the difference in log residual precision between group 2 and group 1
    gamma.group3       ~ dnorm(betaMean.0, betaTau.0) #prior for the difference in log residual precision between group 3 and group 1
    # tau.e <- exp(gamma.0)
    
    
    # conditional distribution of the data (uses priors above)
    for (i in 1:N){
      
      # creating conditional mean to put into model statement 
      weight.hat[i] <- beta.0 + beta.height*height[i] + beta.group2*group2[i] + beta.group3*group3[i] + beta.height.group2*height[i]*group2[i] + beta.height.group3*height[i]*group3[i]
      
      # error values (for R^2)
      error[i] = weight[i]-weight.hat[i]
      
      varhat[i] = gamma.0 + gamma.group2*group2[i] + gamma.group3*group3[i]
      tau.e[i] <- exp(varhat[i])
      # likelihood from model:
      weight[i] ~ dnorm(weight.hat[i], tau.e[i])
      
    }
    
  # calculate mean difference between groups 2 and 3:
    dif.group2.group3 <- beta.group3 - beta.group2
    
    difRPrec.group2.group3 <- gamma.group3 - gamma.group2
    
    # calculate R^2 based on http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf
    # variance.pred = sd(weight.hat[])*sd(weight.hat[])
    # variance.error = sd(error[])*sd(error[])
    # Rsquare = variance.pred/(variance.pred + variance.error)
}
BayesianFullModel3.Parameters = c("beta.0", 
                                          "beta.height", 
                                          "beta.group2", 
                                          "beta.group3", 
                                          "beta.height.group2", 
                                          "beta.height.group3", 
                                          "dif.group2.group3",
                                          "difRPrec.group2.group3",
                                          "gamma.0", "gamma.group2", "gamma.group3")

BayesianFullModel.Results3 = jags(data = BayesianFullModel.JAGS.Data, inits = BayesianFullModel.Init.Values, 
                                  parameters.to.save = BayesianFullModel3.Parameters, 
                                  model.file = BayesianFullModel3, n.chains = 4, n.iter = 50000, n.burnin = 20000, n.thin = 1)
## Warning in jags.model(model.file, data = data, inits = init.values,
## n.chains = n.chains, : Unused variable "alpha.tau.0" in data

## Warning in jags.model(model.file, data = data, inits = init.values,
## n.chains = n.chains, : Unused variable "beta.tau.0" in data
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 7
##    Total graph size: 258
## 
## Initializing model
BayesianFullModel.Results
## Inference for Bugs model at "/var/folders/sl/n1qxb2m57pqfs1hcfqb8p4nx6fm6zn/T//Rtmp4KW8se/modelc9ce600f179a.txt", fit using jags,
##  4 chains, each with 50000 iterations (first 20000 discarded)
##  n.sims = 120000 iterations saved
##                     mu.vect sd.vect     2.5%      25%      50%      75%
## Rsquare               0.967   0.009    0.943    0.963    0.969    0.973
## beta.0              170.224  48.595   74.092  138.361  170.313  202.047
## beta.group2        -172.699  68.550 -308.337 -217.654 -172.683 -128.015
## beta.group3        -132.872  64.224 -260.059 -174.736 -133.152  -90.743
## beta.height          -0.378   0.761   -1.872   -0.877   -0.378    0.122
## beta.height.group2    2.474   1.074    0.338    1.774    2.475    3.179
## beta.height.group3    3.569   1.015    1.549    2.904    3.573    4.234
## dif.group2.group3    39.827  64.221  -86.971   -2.417   39.884   82.100
## sigma.e              13.013   1.934    9.889   11.645   12.789   14.131
## sigma2.e            173.083  53.611   97.792  135.616  163.561  199.678
## tau.e                 0.006   0.002    0.003    0.005    0.006    0.007
## deviance            224.001   6.999  212.273  218.987  223.328  228.274
##                      97.5%  Rhat  n.eff
## Rsquare              0.977 1.001  90000
## beta.0             265.815 1.001  65000
## beta.group2        -36.480 1.001 120000
## beta.group3         -5.012 1.001 120000
## beta.height          1.126 1.001  65000
## beta.height.group2   4.603 1.001 120000
## beta.height.group3   5.575 1.001 120000
## dif.group2.group3  166.368 1.001 120000
## sigma.e             17.414 1.001  77000
## sigma2.e           303.246 1.001  77000
## tau.e                0.010 1.001  77000
## deviance           239.478 1.001  53000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 24.5 and DIC = 248.5
## DIC is an estimate of expected predictive error (lower deviance is better).
BayesianFullModel.Results2
## Inference for Bugs model at "/var/folders/sl/n1qxb2m57pqfs1hcfqb8p4nx6fm6zn/T//Rtmp4KW8se/modelc9ce7ed6b0ba.txt", fit using jags,
##  4 chains, each with 50000 iterations (first 20000 discarded)
##  n.sims = 120000 iterations saved
##                     mu.vect sd.vect     2.5%      25%      50%      75%
## Rsquare               0.974   0.004    0.964    0.972    0.975    0.977
## beta.0              170.265  30.622  109.588  150.242  170.336  190.399
## beta.group2        -172.635  43.290 -257.984 -201.025 -172.630 -144.279
## beta.group3        -132.799  40.446 -212.324 -159.426 -132.811 -106.268
## beta.height          -0.378   0.480   -1.321   -0.694   -0.380   -0.065
## beta.height.group2    2.474   0.678    1.129    2.029    2.474    2.918
## beta.height.group3    3.569   0.639    2.304    3.151    3.568    3.990
## dif.group2.group3    39.836  40.386  -39.727   13.458   39.819   66.308
## gamma.0              -4.188   0.295   -4.805   -4.378   -4.173   -3.982
## sigma.e               8.206   1.246    6.194    7.324    8.058    8.925
## sigma2.e             68.887  21.855   38.369   53.637   64.928   79.661
## tau.e                 0.016   0.005    0.008    0.013    0.015    0.019
## deviance            210.781   4.270  204.739  207.667  210.023  213.080
##                      97.5%  Rhat  n.eff
## Rsquare              0.978 1.001  80000
## beta.0             230.384 1.001 120000
## beta.group2        -86.922 1.001 120000
## beta.group3        -52.803 1.001 120000
## beta.height          0.571 1.001 120000
## beta.height.group2   3.811 1.001 120000
## beta.height.group3   4.822 1.001 120000
## dif.group2.group3  119.644 1.001  64000
## gamma.0             -3.647 1.001  31000
## sigma.e             11.048 1.001  31000
## sigma2.e           122.065 1.001  31000
## tau.e                0.026 1.001  31000
## deviance           221.056 1.001  60000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 9.1 and DIC = 219.9
## DIC is an estimate of expected predictive error (lower deviance is better).
traceplot(BayesianFullModel.Results3)

BayesianFullModel.Results3
## Inference for Bugs model at "/var/folders/sl/n1qxb2m57pqfs1hcfqb8p4nx6fm6zn/T//Rtmp4KW8se/modelc9ce40ecc464.txt", fit using jags,
##  4 chains, each with 50000 iterations (first 20000 discarded)
##  n.sims = 120000 iterations saved
##                     mu.vect sd.vect     2.5%      25%      50%      75%
## Rsquare               0.974   0.004    0.964    0.972    0.975    0.977
## beta.0              169.995  30.606  109.346  150.012  170.080  189.984
## beta.group2        -172.394  43.234 -257.558 -200.665 -172.421 -144.149
## beta.group3        -132.373  40.515 -212.579 -158.872 -132.523 -105.840
## beta.height          -0.374   0.479   -1.324   -0.687   -0.375   -0.060
## beta.height.group2    2.470   0.677    1.129    2.028    2.470    2.914
## beta.height.group3    3.562   0.641    2.291    3.143    3.563    3.981
## dif.group2.group3    40.021  40.569  -39.798   13.374   39.890   66.578
## deviance            210.812   4.298  204.750  207.656  210.047  213.149
##                      97.5%  Rhat  n.eff
## Rsquare              0.978 1.001  25000
## beta.0             230.577 1.001 120000
## beta.group2        -86.719 1.001 120000
## beta.group3        -51.942 1.001 120000
## beta.height          0.576 1.001 120000
## beta.height.group2   3.806 1.001 120000
## beta.height.group3   4.827 1.001 120000
## dif.group2.group3  119.975 1.001 120000
## deviance           221.122 1.001  32000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 9.2 and DIC = 220.0
## DIC is an estimate of expected predictive error (lower deviance is better).

Algorithm Used

Because we moved away from a conjugate prior, we no longer use Gibbs sampling for all parameters. Instead, JAGS now uses a “RealSlicer” (a slice sampler, another general-purpose MCMC algorithm) for \(\gamma_0\):

list.samplers(BayesianFullModel.Results$model)
## $`glm::Generic`
## [1] "beta.height.group3" "beta.height.group2" "beta.group3"       
## [4] "beta.group2"        "beta.height"        "beta.0"            
## 
## $`bugs::ConjugateGamma`
## [1] "tau.e"
list.samplers(BayesianFullModel.Results2$model)
## $`glm::Generic`
## [1] "beta.height.group3" "beta.height.group2" "beta.group3"       
## [4] "beta.group2"        "beta.height"        "beta.0"            
## 
## $`base::RealSlicer`
## [1] "gamma.0"
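JAGS's RealSlicer is an implementation of slice sampling. To give a feel for what it does, here is a minimal univariate slice sampler using the standard stepping-out and shrinkage moves (a sketch, not JAGS's actual implementation), applied to an unnormalized log density:

```r
# minimal univariate slice sampler (stepping out + shrinkage); illustrative only
sliceSample = function(logf, x0, n, w = 1) {
  out = numeric(n)
  x = x0
  for (i in 1:n) {
    # draw the auxiliary "slice" height on the log scale: log u = log f(x) - Exp(1)
    logy = logf(x) - rexp(1)
    # step out an interval of width w until both ends fall outside the slice
    L = x - runif(1) * w
    R = L + w
    while (logf(L) > logy) L = L - w
    while (logf(R) > logy) R = R + w
    # sample uniformly on (L, R), shrinking toward x after each rejection
    repeat {
      x1 = runif(1, L, R)
      if (logf(x1) > logy) { x = x1; break }
      if (x1 < x) L = x1 else R = x1
    }
    out[i] = x
  }
  out
}

# sanity check on a standard normal target (log density up to a constant)
set.seed(1)
draws = sliceSample(function(x) -x^2 / 2, x0 = 0, n = 5000)
mean(draws)   # should be near 0
sd(draws)     # should be near 1
```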

Deviance Information Criterion

The DIC is a Bayesian version of an information criterion used to compare the relative fit of multiple models (see https://en.wikipedia.org/wiki/Deviance_information_criterion). Lower values are better.

The deviance is given by -2 times the log-likelihood plus a constant (which cancels in model comparisons):

\[ D\left(\boldsymbol{\theta}\right) = -2 \log \left( p \left(y \mid \boldsymbol{\theta} \right) \right) + C\]

The DIC is given by the deviance evaluated at the EAP (posterior mean) estimate of all parameters, plus twice a penalty. The earliest form of the penalty \(p_D\) is the posterior mean deviance \(\bar{D}\) minus the deviance at the posterior means:

\[ p_D = \bar{D} - D\left( \bar{\boldsymbol{\theta}}\right)\]

\[ DIC = D \left( \bar{\boldsymbol{\theta}} \right) + 2p_D\]
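Both penalty rules can be computed directly from saved deviance draws; note that R2jags reports the alternative rule \(p_D = \text{var}(D)/2\) in its printed output. The deviance draws and \(D(\bar{\boldsymbol{\theta}})\) below are hypothetical numbers for illustration, not from our model:

```r
set.seed(1)
# hypothetical posterior draws of the deviance
devianceDraws = rnorm(10000, mean = 210, sd = 3)
DBar = mean(devianceDraws)        # posterior mean deviance

# hypothetical deviance evaluated at the posterior (EAP) parameter estimates
DThetaBar = 204

# classic penalty and DIC
pD = DBar - DThetaBar
DIC = DThetaBar + 2 * pD          # equivalently DBar + pD

# the variance-based rule R2jags uses in its printed output
pD.alt = var(devianceDraws) / 2
DIC.alt = DBar + pD.alt
```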

Here, the smallest value of the DIC belongs to the model with the log-linear variance prior. So, it is preferred and would be the one interpreted.